On the Opportunities and Challenges of Offline Reinforcement Learning for Recommender Systems
Reinforcement learning serves as a potent tool for modeling dynamic user
interests within recommender systems, garnering increasing research attention
of late. However, a significant drawback persists: its poor data efficiency,
stemming from its interactive nature. The training of reinforcement
learning-based recommender systems demands expensive online interactions to
amass adequate trajectories, essential for agents to learn user preferences.
This inefficiency renders reinforcement learning-based recommender systems a
formidable undertaking, necessitating the exploration of potential solutions.
Recent strides in offline reinforcement learning present a new perspective.
Offline reinforcement learning empowers agents to glean insights from offline
datasets and deploy learned policies in online settings. Given that recommender
systems possess extensive offline datasets, the framework of offline
reinforcement learning is a natural fit. Although the field is growing quickly,
work on recommender systems that use offline reinforcement learning
remains limited. This survey aims to introduce and delve into offline
reinforcement learning within recommender systems, offering an inclusive review
of existing literature in this domain. Furthermore, we strive to underscore
prevalent challenges, opportunities, and future pathways, poised to propel
research in this evolving field.
Comment: under revie
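To make the idea of learning from a fixed log concrete, here is a minimal sketch of tabular Q-learning run purely over logged transitions, with no further interaction with users. It is not a method from the survey. The function name `offline_q_learning`, the toy log, and all hyperparameters are hypothetical, and practical offline methods usually add safeguards against distribution shift that this sketch omits.

```python
def offline_q_learning(transitions, n_states, n_actions,
                       gamma=0.9, lr=0.1, epochs=200):
    """Fit a Q-table by replaying a fixed list of logged transitions.

    Each transition is (state, action, reward, next_state, done).
    No new interactions are collected: the log is the only data source.
    """
    q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(epochs):
        for s, a, r, s_next, done in transitions:
            # Standard Q-learning target, bootstrapped from the current table.
            target = r if done else r + gamma * max(q[s_next])
            q[s][a] += lr * (target - q[s][a])
    return q


# Hypothetical toy log with 2 user states and 2 candidate items (actions).
# Only recommending item 0 in state 1 yields a reward (e.g. a click).
logged = [
    (0, 0, 0.0, 1, False),
    (0, 1, 0.0, 0, False),
    (1, 0, 1.0, 1, True),
    (1, 1, 0.0, 0, False),
]

q_table = offline_q_learning(logged, n_states=2, n_actions=2)
# Greedy policy: the highest-valued action in each state.
greedy_policy = [row.index(max(row)) for row in q_table]
```

The learned policy can then be deployed online. How well it transfers depends on how well the log covers the states and actions the policy will visit.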
Intrinsically Motivated Reinforcement Learning based Recommendation with Counterfactual Data Augmentation
Deep reinforcement learning (DRL) has proven effective at capturing users'
dynamic interests in recent literature. However, training a DRL agent is
challenging: because the environment in recommender systems (RS) is sparse, DRL
agents must divide their time between exploring informative user-item
interaction trajectories and using existing trajectories for policy learning.
This is known as the exploration-exploitation trade-off, which affects
recommendation performance significantly when the environment is sparse.
Balancing exploration and exploitation is more challenging in DRL-based RS,
where the agent needs to explore informative trajectories deeply and exploit
them efficiently. As a step toward addressing this issue, we design a novel
intrinsically motivated reinforcement learning method to increase the
capability of exploring informative interaction trajectories in the sparse
environment, which are further enriched via a counterfactual augmentation
strategy for more efficient exploitation. Extensive experiments on six offline
datasets and three online simulation platforms demonstrate the superiority of
our model over a set of existing state-of-the-art methods.
Contrastive Counterfactual Learning for Causality-aware Interpretable Recommender Systems
There has been a recent surge in the study of generating recommendations
within the framework of causal inference, with the recommendation being treated
as a treatment. This approach enhances our understanding of how recommendations
influence user behaviour and allows for identification of the factors that
contribute to this impact. Many researchers in the field of causal inference
for recommender systems have focused on using propensity scores, which can
reduce bias but may also introduce additional variance. Other studies have
proposed the use of unbiased data from randomized controlled trials, though
this approach requires certain assumptions that may be difficult to satisfy in
practice. In this paper, we first explore the causality-aware interpretation of
recommendations and show that the underlying exposure mechanism can bias the
maximum likelihood estimation (MLE) of observational feedback. Given that
confounders may be inaccessible for measurement, we propose using contrastive
self-supervised learning (SSL) to reduce exposure bias, specifically through
the use of inverse propensity scores and the expansion of the positive sample
set. Based on theoretical
findings, we introduce a new contrastive counterfactual learning method (CCL)
that integrates three novel positive sampling strategies based on estimated
exposure probability or random counterfactual samples. Through extensive
experiments on two real-world datasets, we demonstrate that our CCL outperforms
the state-of-the-art methods.
Comment: conferenc
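The bias-variance remark about propensity scores in the abstract above can be illustrated with a small inverse propensity scoring (IPS) sketch. It is a generic estimator, not the CCL method. The function name `ips_estimate`, the `min_propensity` clipping parameter, and the numbers are hypothetical.

```python
def ips_estimate(rewards, propensities, n_total, min_propensity=None):
    """Inverse-propensity-scored estimate of the mean reward over n_total pairs.

    rewards:       feedback observed for the exposed user-item pairs
    propensities:  estimated probability that each of those pairs was exposed
    n_total:       number of user-item pairs in the full population
    min_propensity: optional floor on propensities; clipping caps the largest
                    weights, which lowers variance but reintroduces some bias
    """
    total = 0.0
    for r, p in zip(rewards, propensities):
        if min_propensity is not None:
            p = max(p, min_propensity)
        # Rarely exposed pairs get large weights to stand in for unseen ones.
        total += r / p
    return total / n_total


# Hypothetical log: three exposed pairs out of a population of five.
rewards = [1.0, 0.0, 1.0]
propensities = [0.5, 0.25, 0.8]

unclipped = ips_estimate(rewards, propensities, n_total=5)
clipped = ips_estimate(rewards, propensities, n_total=5, min_propensity=0.6)
```

With these numbers the unclipped estimate is (1/0.5 + 0 + 1/0.8) / 5 = 0.65. Clipping the propensity 0.5 up to 0.6 shrinks the first weight and lowers the estimate.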